# Lightweight Vision-Language Model

## SmolDocling 256M Preview (MLX bf16, Docling snap)

*ds4sd · Image-to-Text · Transformers · English · 246 downloads · 1 like*

A 256M-parameter preview of a document-understanding model, designed for document structure parsing and content extraction; it converts document images into structured data.
## Qwen2.5-VL-7B-Captioner-Relaxed GGUF

*samgreen · Apache-2.0 · Image-to-Text · English · 320 downloads · 1 like*

Qwen2.5-VL-7B-Captioner-Relaxed is a multimodal vision-language model based on the Qwen2.5 architecture, focused on image-to-text generation; distributed here in GGUF format.
## SmolVLM2 500M Video Instruct (MLX 8bit, skip-vision)

*mlx-community · Apache-2.0 · Image-to-Text · Transformers · English · 51 downloads · 2 likes*

An MLX-format model converted from SmolVLM2-500M-Video-Instruct, supporting video-to-text tasks.
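The "8bit" in the entry above refers to weight quantization: each float weight is stored as an 8-bit integer plus per-tensor scale and offset, roughly quartering memory versus fp32. As a minimal illustrative sketch (affine quantization in plain Python, not MLX's exact grouped scheme):

```python
def quantize_8bit(weights):
    """Affine 8-bit quantization: map floats onto integer codes 0..255."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize_8bit(codes, scale, lo):
    """Recover approximate float weights from the 8-bit codes."""
    return [c * scale + lo for c in codes]

weights = [-0.52, -0.13, 0.0, 0.27, 0.98]
codes, scale, lo = quantize_8bit(weights)
approx = dequantize_8bit(codes, scale, lo)

# Reconstruction error is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, approx))
assert max_err <= scale / 2 + 1e-9
```

Real runtimes quantize per group of weights (and keep activations in higher precision), but the trade-off is the same: smaller files and faster memory-bound inference for a bounded loss of precision.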
## SmolVLM2 500M Video Instruct (MLX)

*mlx-community · Apache-2.0 · Image-to-Text · Transformers · English · 2,491 downloads · 12 likes*

A video-text-to-text model in MLX format, developed by HuggingFaceTB, supporting English-language processing.
## LlavaGuard v1.2 0.5B OV

*AIML-TUDA · Image-to-Text · 239 downloads · 2 likes*

LlavaGuard is a safety guard built on vision-language models, used primarily for safety classification and violation detection on image content.
## Doubutsu 2B PT 756

*qresearch · Apache-2.0 · Image-to-Text · Transformers · English · 129 downloads · 3 likes*

Doubutsu is a series of lightweight vision-language models designed for fine-tuning on custom scenarios.
## Cerule v0.1

*Tensoic · Image-to-Text · Transformers · English · 157 downloads · 47 likes*

Cerule is a lightweight yet capable vision-language model built on Google's Gemma-2b and SigLIP, focused on image-text processing.
## UForm-Gen2-dpo

*unum-cloud · Apache-2.0 · Image-to-Text · Transformers · English · 3,568 downloads · 44 likes*

UForm-Gen2-dpo is a small generative vision-language model aligned for image captioning and visual question answering via Direct Preference Optimization (DPO) on the VLFeedback and LLaVA-Human-Preference-10K preference datasets.
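DPO, the alignment method named above, trains directly on preference pairs: it raises the log-probability of the preferred caption relative to the rejected one, measured against a frozen reference model, with no separate reward model. A minimal sketch of the loss on one preference pair, assuming per-sequence log-probabilities are already computed (function and argument names here are illustrative, not UForm's API):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    The margin compares how much more the policy favors the chosen
    response over the rejected one, relative to the frozen reference;
    beta controls how sharply deviations from the reference are scored.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# If the policy already prefers the chosen caption more strongly than
# the reference does, the margin is positive and the loss drops below
# log(2); the loss at margin 0 is exactly log(2).
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0)
```

Minimizing this over a preference dataset (here, VLFeedback and LLaVA-Human-Preference-10K) nudges the policy toward human-preferred captions while the reference term keeps it from drifting too far from the base model.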
## Moondream Prompt

*gokaygokay · Apache-2.0 · Image-to-Text · Transformers · 162 downloads · 10 likes*

A fine-tuned version of Moondream2 optimized for image prompt generation; a lightweight vision-language model suited to efficient inference on edge devices.
## LLaVa-Phi-2-3B

*marianna13 · MIT · Image-to-Text · Transformers · English · 153 downloads · 13 likes*

LLaVa-Phi-2-3B is an open-source multimodal chatbot model fine-tuned on the Phi-2 architecture, processing image and text inputs to generate natural-language responses.
© 2025 AIbase